```r
library(tidyverse)

read_csv('https://wegweisr.haim.it/Daten/breaking_bad_deaths.csv') |>
  count(method, sort = TRUE) |>
  head(n = 5)
```

Summer semester 2025
| Session | Date | Topic |
|---|---|---|
| 1 | 23.04.2025 | Introduction |
| 2 | 30.04.2025 | GLM basics |
| 3 | 07.05.2025 | Linear regression |
| 4 | 21.05.2025 | Comparing means |
| 5 | 28.05.2025 | Multiple regression |
| 6 | 04.06.2025 | Model assumptions |
| 7 | 11.06.2025 | Model predictions |
| 8 | 18.06.2025 | Moderation analysis I |
| 9 | 25.06.2025 | Moderation analysis II |
| 10 | 02.07.2025 | Logistic regression |
| 11 | 09.07.2025 | Multilevel regression |
| 12 | 16.07.2025 | Wrap-up |
What does this code do?
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.
Miles, J., & Shevlin, M. (2001). Applying regression and correlation: A guide for students and researchers. London: Sage.
Darlington, R. B., & Hayes, A. F. (2016). Regression analysis and linear models: Concepts, applications, and implementation. Guilford Publications.
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press. (for those interested)
Interpret the following analyses:
“Inferential statistics (i.e., inductive statistics) is concerned with the question of how conclusions about facts in an underlying population can be drawn from sample data.” (Eid et al., 2010, p. 191)
The means of the individual samples scatter around the true population mean of 170; the standard deviation of these sample means is the standard error (SE).
We can estimate this spread of the sample means from a single sample: \(SE = SD(x)/\sqrt{n-1}\), where \(SD(x)\) is computed with denominator \(n\) (equivalently \(s/\sqrt{n}\) with the usual sample standard deviation \(s\)).
SE based on our first sample: \(SE = 11/\sqrt{29} \approx 2.04\).
Red: normal distribution curve with the mean and standard error from the first sample.
95% confidence interval based on our first sample (M and SE): 167.8 to 175.8
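The interval can be reproduced as \(M \pm 1.96 \cdot SE\). A minimal sketch in R, assuming \(M = 171.8\) (the midpoint of the reported interval; the mean itself is not stated above), \(SD = 11\), and \(n = 30\) (so that \(\sqrt{n-1} = \sqrt{29}\)):

```r
# Assumed values for the first sample (M = midpoint of the reported interval)
m    <- 171.8
sd_x <- 11
n    <- 30

# Standard error as defined above: SD / sqrt(n - 1)
se <- sd_x / sqrt(n - 1)

# 95% CI with the normal quantile qnorm(0.975), approx. 1.96
z  <- qnorm(0.975)
ci <- c(lower = m - z * se, upper = m + z * se)

round(se, 2)
round(ci, 1)
```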
The larger the sample (n), the smaller the standard error (SE), i.e., the narrower the confidence interval. Regardless of n, however, with 95% CIs, in the long run 5 out of 100 intervals do not contain the population value.
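The long-run statement can be checked by simulation. A sketch under stated assumptions: a normally distributed population with mean 170 and a hypothetical SD of 11, samples of \(n = 30\), and t-based 95% intervals from `t.test()` (slightly wider than the \(\pm 1.96 \cdot SE\) version):

```r
set.seed(42)

mu    <- 170    # true population mean (from the example above)
sigma <- 11     # assumed population SD (hypothetical)
n     <- 30     # sample size
reps  <- 10000  # number of simulated samples

# Draw many samples and record whether each 95% CI contains mu
covers <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

coverage <- mean(covers)  # share of intervals that contain mu
coverage

# Theoretical standard error sigma / sqrt(n) shrinks as n grows
ns <- c(10, 30, 100, 1000)
sigma / sqrt(ns)
```

Other seeds give coverage values scattered closely around 0.95; any single interval either contains 170 or it does not.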
p(Data|H0): the probability of observing the empirical data (or even more extreme data) if the null hypothesis is true. This is what the p-value expresses.
p(Data|H1): the probability of observing the empirical data if the alternative hypothesis is true.
p(H0|Data): the probability that the null hypothesis is true in light of the data.
p(H1|Data): the probability that the alternative hypothesis is true in light of the data.
So the p-value says nothing about the probability of the null or the alternative hypothesis!
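Why \(p(\text{Data}|H_0)\) and \(p(H_0|\text{Data})\) differ can be shown with Bayes' theorem applied to the event "the test is significant". A sketch with purely hypothetical inputs (prior probability of \(H_0\) of 0.5, power 0.8, \(\alpha = 0.05\)); these numbers are assumptions for illustration, not course data:

```r
alpha   <- 0.05  # p(significant | H0)
power   <- 0.80  # p(significant | H1), hypothetical
prior_0 <- 0.50  # p(H0) before seeing the data, hypothetical

# Total probability of a significant result
p_sig <- alpha * prior_0 + power * (1 - prior_0)

# Bayes' theorem: p(H0 | significant)
post_0 <- alpha * prior_0 / p_sig
round(post_0, 3)
```

The result depends on the prior: with \(p(H_0) = 0.9\), the same significant result gives \(p(H_0|\text{sig.}) = 0.045 / 0.125 = 0.36\), although \(p(\text{sig.}|H_0)\) is still 0.05.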
Also:
Source: https://www.statisticssolutions.com
Inferential statistics "works" because…
Source: https://onishlab.colostate.edu/wp-content/uploads/2019/07/which_test_flowchart.png
In classical statistics education (ours included), methods are taught as a collection of recipes:
Focus on differences and specifics instead of commonalities
But many procedures are at least functionally, and often even mathematically, equivalent!
There has been little attempt to understand the influence on children of branded products that appear in television programs and movies. A study exposed children of two different age groups (6–7 and 11–12) in classrooms to a brief film clip. Half of each class was shown a scene from Home Alone that shows Pepsi Cola being spilled during a meal. The other half was shown a similar clip from Home Alone but without branded products. All children were invited to help themselves from a choice of Pepsi or Coke at the outset of the individual interviews.
| id | pepsi_placement | pepsi_chosen |
|---|---|---|
| 49 | 1 | 0 |
| 54 | 1 | 0 |
| 19 | 1 | 1 |
| 6 | 1 | 1 |
| 52 | 1 | 0 |
| pepsi_chosen | no_placement (%) | placement (%) |
|---|---|---|
| 0 | 57 | 37 |
| 1 | 43 | 63 |
| Chi2(1) | p | Cramér's V (adj.) | 95% CI |
|---|---|---|---|
| 4.14 | 0.042 | 0.17 | (0.00, 1.00) |
| Parameter1 | Parameter2 | r | 95% CI | p |
|---|---|---|---|---|
| pepsi_placement | pepsi_chosen | 0.20 | (0.01, 0.38) | 0.042 |
Alternative hypothesis: true correlation is not equal to 0
| Parameter1 | Parameter2 | tau | z | p |
|---|---|---|---|---|
| pepsi_placement | pepsi_chosen | 0.20 | 2.03 | 0.043 |
Alternative hypothesis: true tau is not equal to 0
| Difference | 95% CI | t(103) | p | d |
|---|---|---|---|---|
| -0.20 | (-0.39, -0.01) | -2.06 | 0.042 | -0.41 |
| Parameter | Sum_Squares | df | Mean_Square | F | p | Eta2 |
|---|---|---|---|---|---|---|
| pepsi_placement | 1.03 | 1 | 1.03 | 4.23 | 0.042 | 0.04 |
| Residuals | 25.10 | 103 | 0.24 |
“The only formula you’ll ever need.” Andy Field
\[ \text{outcome}_i = \text{model}_i + \text{error}_i \]
Question: If we have only a single estimate \(a\) for \(Y\), which value is the best estimate?
\[ Y_i = a + \epsilon_i \]
Answer: the mean \(\bar{Y}\) is the best model coefficient in the null model, in the least-squares sense (it minimizes the sum of squared errors \(\sum_i \epsilon_i^2\)).
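A quick numerical check that the mean is the least-squares estimate in the null model, using a small hypothetical vector `y`: `optimize()` searches for the constant `a` with the smallest sum of squared errors, and `lm(y ~ 1)` fits the intercept-only model directly.

```r
y <- c(2, 4, 4, 5, 7, 8)  # hypothetical outcome values

# Sum of squared errors for a constant prediction a
sse <- function(a) sum((y - a)^2)

best_a     <- optimize(sse, interval = range(y))$minimum
null_model <- lm(y ~ 1)  # intercept-only (null) model

c(mean = mean(y), optimize = best_a, lm = unname(coef(null_model)))
```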
Problem: this model does not explain anything, because it lacks a predictor variable \(X\).
\[ Y_i = b_0 + b_1 X_i + \epsilon_i \]
\[ Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \dots + \epsilon_i \]
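In R, this equation maps almost one-to-one onto the formula passed to `lm()`: the intercept \(b_0\) is included implicitly, and each predictor gets its own coefficient. A sketch with simulated data (the variable names `x1` to `x3` and the true coefficients are hypothetical):

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# Y_i = b0 + b1*X1i + b2*X2i + b3*X3i + e_i with b = (1, 2, -0.5, 3)
y <- 1 + 2 * x1 - 0.5 * x2 + 3 * x3 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3)
round(coef(fit), 2)
```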
| Parameter | Coefficient | 95% CI | t(103) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 0.43 | (0.29, 0.57) | 6.24 | < .001 | 0.00 | |
| pepsi placement | 0.20 | (0.01, 0.39) | 2.06 | 0.042 | 0.20 | |
| AICc | | | | | | 153.96 |
| R2 | | | | | | 0.04 |
| R2 (adj.) | | | | | | 0.03 |
| Sigma | | | | | | 0.49 |
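The matching p-values in the preceding tables are no coincidence. A sketch with simulated 0/1 data (not the original study data; \(n = 105\) is taken from t(103), and the choice probabilities 0.43 and 0.63 from the intercept and slope above) shows that, for a binary predictor, Student's t-test, a one-way ANOVA, the regression slope and Pearson's correlation test the same hypothesis, and that the chi-squared statistic without continuity correction equals \(n \cdot r^2\):

```r
set.seed(2025)
n <- 105
placement <- rep(c(0, 1), length.out = n)  # hypothetical design
chosen    <- rbinom(n, 1, ifelse(placement == 1, 0.63, 0.43))

# Four ways to test the same group difference
p_t     <- t.test(chosen ~ placement, var.equal = TRUE)$p.value
p_anova <- anova(lm(chosen ~ factor(placement)))$`Pr(>F)`[1]
p_lm    <- summary(lm(chosen ~ placement))$coefficients["placement", "Pr(>|t|)"]
p_cor   <- cor.test(placement, chosen)$p.value

# Chi-squared statistic vs. n * r^2 (phi coefficient = Pearson r here)
chi2 <- chisq.test(table(placement, chosen), correct = FALSE)$statistic
r    <- cor(placement, chosen)

round(c(t = p_t, anova = p_anova, lm = p_lm, cor = p_cor), 4)
c(chi2 = unname(chi2), n_r2 = n * r^2)
```

The chi-squared test itself is an asymptotic test, so its p-value can deviate slightly from the other four.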
Source: Scharkow, Festl, Vogelgesang & Quandt, 2013
Source: https://www.cjr.org/tow_center_reports/the_curious_journalists_guide_to_data.php
| tv_time | age | games_time | music_time |
|---|---|---|---|
| 0.0 | 22 | 0 | 4.00 |
| 0.0 | 43 | 0 | 2.50 |
| 2.0 | 38 | 0 | 0.17 |
| 5.0 | 30 | 0 | 2.00 |
| 1.5 | 29 | 1 | 0.75 |
| 2.0 | 57 | 0 | 0.00 |
| Variable | Summary |
|---|---|
| Mean tv_time (SD) | 2.73 (3.67) |
| Mean age (SD) | 46.95 (14.67) |
| Mean games_time (SD) | 0.93 (2.41) |
| Mean music_time (SD) | 2.14 (2.75) |
| Parameter1 | Parameter2 | r | 95% CI | p |
|---|---|---|---|---|
| age | music_time | -0.22 | (-0.26, -0.17) | < .001 |
Alternative hypothesis: true correlation is not equal to 0
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 4.04 | (3.65, 4.42) | 20.53 | < .001 | 0.00 | |
| age | -0.04 | (-0.05, -0.03) | -10.14 | < .001 | -0.22 | |
| AICc | | | | | | 10125.19 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 2.68 |
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 3.31 | (3.06, 3.57) | 25.49 | < .001 | 0.00 | |
| age18 | -0.04 | (-0.05, -0.03) | -10.14 | < .001 | -0.22 | |
| AICc | | | | | | 10125.19 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 2.68 |
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 2.14 | (2.03, 2.25) | 36.56 | < .001 | 0.00 | |
| age centered | -0.04 | (-0.05, -0.03) | -10.14 | < .001 | -0.22 | |
| AICc | | | | | | 10125.19 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 2.68 |
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 128.40 | (121.51, 135.28) | 36.56 | < .001 | 0.00 | |
| age centered | -2.43 | (-2.90, -1.96) | -10.14 | < .001 | -0.22 | |
| AICc | | | | | | 27346.01 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 161.06 |
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 0.00 | (-0.04, 0.04) | 0.08 | 0.936 | 0.00 | |
| age zstd | -0.22 | (-0.26, -0.17) | -10.14 | < .001 | -0.22 | |
| AICc | | | | | | 5872.65 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 0.98 |
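The models above differ only in how the variables are transformed. A sketch with simulated data (hypothetical values for `age` and `music_time`; the factor 60 is an assumption, chosen because the coefficients of the model with intercept 128.40 are about 60 times those of the centered model, e.g., 2.14 × 60 = 128.4) checks the patterns: shifting a predictor changes only the intercept; after centering, the intercept equals the outcome mean; rescaling the outcome rescales all coefficients; and z-standardizing both variables makes the slope equal to Pearson's r.

```r
set.seed(7)
n <- 500

# Hypothetical data: age in years, music_time with a negative age trend
age <- round(runif(n, 18, 80))
music_time <- pmax(0, 4 - 0.04 * age + rnorm(n, sd = 2.5))

m_raw   <- lm(music_time ~ age)                         # raw age
m_18    <- lm(music_time ~ I(age - 18))                 # age minus 18
m_cent  <- lm(music_time ~ I(age - mean(age)))          # mean-centered age
m_scale <- lm(I(music_time * 60) ~ I(age - mean(age)))  # outcome times 60

# Both variables z-standardized
z_music <- as.numeric(scale(music_time))
z_age   <- as.numeric(scale(age))
m_z     <- lm(z_music ~ z_age)

slopes <- unname(c(coef(m_raw)[2], coef(m_18)[2], coef(m_cent)[2]))
slopes                                       # the same slope three times
unname(coef(m_cent)[1]) - mean(music_time)   # centered intercept minus mean: ~0
unname(coef(m_scale) / coef(m_cent))         # both ratios equal 60
unname(coef(m_z)[2]) - cor(age, music_time)  # standardized slope minus r: ~0
```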
| Parameter | Coefficient | 95% CI | t(2101) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 3.91 | (3.39, 4.43) | 14.63 | < .001 | 0.00 | |
| age | -0.03 | (-0.04, -0.01) | -4.62 | < .001 | -0.10 | |
| AICc | | | | | | 11414.93 |
| R2 | | | | | | 0.01 |
| R2 (adj.) | | | | | | 0.01 |
| Sigma | | | | | | 3.65 |
| Membership | Group B | Group C | Group D |
|---|---|---|---|
| Group A | 0 | 0 | 0 |
| Group B | 1 | 0 | 0 |
| Group C | 0 | 1 | 0 |
| Group D | 0 | 0 | 1 |

| Membership | Group A | Group B | Group C |
|---|---|---|---|
| Group D | 0 | 0 | 0 |
| Group A | 1 | 0 | 0 |
| Group B | 0 | 1 | 0 |
| Group C | 0 | 0 | 1 |
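R builds exactly these dummy codes via treatment contrasts, its default for unordered factors. A sketch using base R's `contrasts()` and `relevel()`; the factor `group` with levels A to D is hypothetical:

```r
group <- factor(c("A", "B", "C", "D"))

# Reference group A (R's default: the first level)
contr_a <- contrasts(group)
contr_a

# Reference group D: relevel() makes D the first level
group_d <- relevel(group, ref = "D")
contr_d <- contrasts(group_d)
contr_d
```

Changing the reference group only changes which comparisons the coefficients express; the fitted group means stay the same.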
Coming across news on social network sites (SNS) largely depends on news-related activities in one’s network. Although there are many different ways to stumble upon news, limited research has been conducted on how distinct news curation practices influence users’ intention to consume encountered content. In this mixed-methods investigation, using Facebook as an example, we first examine the results of an experiment (study 1, n = 524), showing that getting tagged in comments to news posts promotes news consumption the most.
| modus | rw | modus_tag |
|---|---|---|
| Tag | 5 | 1 |
| Chronik | 2 | 0 |
| Post | 3 | 0 |
| DM | 1 | 0 |
| Chronik | 1 | 0 |
| Chronik | 2 | 0 |
| Variable | Summary |
|---|---|
| Mean rw (SD) | 3.04 (1.30) |
| modus | n | M | SD |
|---|---|---|---|
| Chronik | 141 | 2.88 | 1.20 |
| Post | 97 | 2.79 | 1.25 |
| Tag | 152 | 3.51 | 1.33 |
| DM | 134 | 2.84 | 1.28 |
| Difference | 95% CI | t(522) | p | d |
|---|---|---|---|---|
| -0.67 | (-0.91, -0.43) | -5.51 | < .001 | -0.48 |
| Parameter | Coefficient | 95% CI | t(522) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 2.84 | (2.71, 2.97) | 43.26 | < .001 | 0.00 | |
| modus tag | 0.67 | (0.43, 0.91) | 5.51 | < .001 | 0.23 | |
| AICc | | | | | | 1738.89 |
| R2 | | | | | | 0.05 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 1.27 |
| Parameter | Sum_Squares | df | Mean_Square | F | p | Eta2 |
|---|---|---|---|---|---|---|
| modus | 49.12 | 3 | 16.37 | 10.17 | < .001 | 0.06 |
| Residuals | 837.19 | 520 | 1.61 |
| Parameter | Coefficient | 95% CI | t(520) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 2.88 | (2.67, 3.09) | 26.95 | < .001 | -0.12 | |
| modus (Post) | -0.09 | (-0.41, 0.24) | -0.51 | 0.609 | -0.07 | |
| modus (Tag) | 0.63 | (0.34, 0.93) | 4.27 | < .001 | 0.49 | |
| modus (DM) | -0.04 | (-0.34, 0.26) | -0.28 | 0.776 | -0.03 | |
| AICc | | | | | | 1742.69 |
| R2 | | | | | | 0.06 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 1.27 |
| Parameter | Coefficient | 95% CI | t(520) | p | Std. Coef. | Fit |
|---|---|---|---|---|---|---|
| (Intercept) | 2.84 | (2.62, 3.05) | 25.87 | < .001 | -0.15 | |
| modus dm (Chronik) | 0.04 | (-0.26, 0.34) | 0.28 | 0.776 | 0.03 | |
| modus dm (Post) | -0.04 | (-0.37, 0.29) | -0.25 | 0.804 | -0.03 | |
| modus dm (Tag) | 0.68 | (0.38, 0.97) | 4.50 | < .001 | 0.52 | |
| AICc | | | | | | 1742.69 |
| R2 | | | | | | 0.06 |
| R2 (adj.) | | | | | | 0.05 |
| Sigma | | | | | | 1.27 |
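Why the coefficients change with the reference category while model fit (AICc, R2, Sigma) stays identical: with dummy coding, the intercept is the mean of the reference group and each coefficient is a difference from that mean. A sketch with simulated ratings (hypothetical data; only the four level names are taken from the study):

```r
set.seed(3)
modus <- factor(rep(c("Chronik", "Post", "Tag", "DM"), each = 50),
                levels = c("Chronik", "Post", "Tag", "DM"))
rw <- round(runif(200, 1, 5))  # hypothetical ratings

means <- tapply(rw, modus, mean)  # group means in level order

m_chronik <- lm(rw ~ modus)                       # reference: Chronik
m_dm      <- lm(rw ~ relevel(modus, ref = "DM"))  # reference: DM

# Intercept = reference mean, slopes = differences to the reference mean
coef(m_chronik)
means[-1] - means["Chronik"]

# Same fitted values, hence same R2 and residuals, in both codings
all.equal(fitted(m_chronik), fitted(m_dm))
```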
| term | contrast | estimate | std.error | statistic | p.value | conf.low | conf.high | p_adjusted |
|---|---|---|---|---|---|---|---|---|
| modus | DM - Chronik | -0.04 | 0.15 | -0.28 | 0.78 | -0.34 | 0.26 | 1 |
| modus | DM - Post | 0.04 | 0.17 | 0.25 | 0.80 | -0.29 | 0.37 | 1 |
| modus | DM - Tag | -0.68 | 0.15 | -4.50 | 0.00 | -0.97 | -0.38 | 0 |
| modus | Post - Chronik | -0.09 | 0.17 | -0.51 | 0.61 | -0.41 | 0.24 | 1 |
| modus | Tag - Chronik | 0.63 | 0.15 | 4.27 | 0.00 | 0.34 | 0.92 | 0 |
| modus | Tag - Post | 0.72 | 0.16 | 4.36 | 0.00 | 0.40 | 1.04 | 0 |
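The p-values of several pairwise comparisons can be adjusted for multiple testing with base R's `p.adjust()` (cf. Bender & Lange, 2001). A sketch that recomputes two-sided p-values from the rounded test statistics in the table, assuming 520 degrees of freedom (the residual df of the four-group model; the table states neither the df used nor the adjustment method), and then applies the Bonferroni and Holm corrections:

```r
contrast <- c("DM - Chronik", "DM - Post", "DM - Tag",
              "Post - Chronik", "Tag - Chronik", "Tag - Post")
t_stat <- c(-0.28, 0.25, -4.50, -0.51, 4.27, 4.36)

# Two-sided p-values from the t distribution (df = 520 assumed)
p_raw <- 2 * pt(-abs(t_stat), df = 520)

p_bonf <- p.adjust(p_raw, method = "bonferroni")  # p times 6, capped at 1
p_holm <- p.adjust(p_raw, method = "holm")        # step-down, less conservative

data.frame(contrast,
           p_raw  = round(p_raw, 4),
           p_bonf = round(p_bonf, 4),
           p_holm = round(p_holm, 4))
```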
| modus | estimate | std.error | statistic | p.value | conf.low | conf.high | df |
|---|---|---|---|---|---|---|---|
| Chronik | 2.88 | 0.11 | 26.95 | 0 | 2.67 | 3.09 | Inf |
| Post | 2.79 | 0.13 | 21.69 | 0 | 2.54 | 3.05 | Inf |
| Tag | 3.51 | 0.10 | 34.14 | 0 | 3.31 | 3.71 | Inf |
| DM | 2.84 | 0.11 | 25.87 | 0 | 2.62 | 3.05 | Inf |
Bender, R., & Lange, S. (2001). Adjusting for multiple testing—when and how? Journal of Clinical Epidemiology, 54(4), 343-349.
Davis, M. J. (2010). Contrast coding in multiple regression analysis: Strengths, weaknesses, and utility of popular coding structures. Journal of Data Science, 8(1), 61-73.
Kümpel, A. S. (2019). Getting tagged, getting involved with news? A mixed-methods investigation of the effects and motives of news-related tagging activities on social network sites. Journal of Communication, 69(4), 373-395.
We compare the danceability and musical mood (valence) of Top 10 hits across four decades (1990s to 2020s), based on Billboard and Spotify data.
Both variables are scaled from 0 (low) to 100 (high). The means and case numbers per decade are as follows:
| decade | danceability | valence | n |
|---|---|---|---|
| 1990s | 64.72 | 56.09 | 588 |
| 2000s | 67.34 | 57.98 | 558 |
| 2010s | 67.31 | 51.93 | 499 |
| 2020s | 66.10 | 51.38 | 69 |
Interpret the results of the two linear models testing the mean differences, line by line.
Which decades are not compared with each other, i.e., for which pairs would we need post-hoc comparisons?
Please enter your solution in Moodle by 04.06.2025, 10 a.m.
Outcome: danceability

| Predictors | Coefficient (B) | SE (B) | p |
|---|---|---|---|
| (Intercept) | 64.72 | 0.59 | <0.001 |
| decade [2000s] | 2.62 | 0.85 | 0.002 |
| decade [2010s] | 2.59 | 0.88 | 0.003 |
| decade [2020s] | 1.38 | 1.83 | 0.452 |

Outcome: valence

| Predictors | Coefficient (B) | 95% CI (B) |
|---|---|---|
| (Intercept) | 51.38 | 45.97 – 56.79 |
| decade [1990s] | 4.71 | -1.01 – 10.43 |
| decade [2000s] | 6.60 | 0.87 – 12.34 |
| decade [2010s] | 0.56 | -5.22 – 6.33 |